Abstract

The diverse T4-like phages (Tquatrovirinae) infect a wide array of gram-negative bacterial hosts. The genome architecture of these
phages is generally well conserved, most of the phylogenetically variable genes being grouped together in a series of hyperplastic
regions (HPRs) that are interspersed among large blocks of conserved core genes. Recent evidence from a pair of closely related
T4-like phages has suggested that small, composite terminator/promoter sequences (promoter-early stem loops [PeSLs]) were
implicated in mediating the high levels of genetic plasticity by indels occurring within the HPRs. Here, we present the genome
sequence analysis of two T4-like phages, PST (168 kb, 272 open reading frames [ORFs]) and nt-1 (248 kb, 405 ORFs). These two
phages were chosen for comparative sequence analysis because, although they are closely related to phages that have been 
previously sequenced (T4 and KVP40, respectively), they have different host ranges. In each case, one member of the pair infects a
bacterial strain that is a human pathogen, whereas the other phage's host is a nonpathogen. Despite belonging to phylogenet- 
ically distant branches of the T4-likes, these pairs of phage have diverged from each other in part by a mechanism apparently 
involving PeSL-mediated recombination. This analysis confirms a role of PeSL sequences in the generation of genomic diversity by 
serving as a point of genetic exchange between otherwise unrelated sequences within the HPRs. Finally, the palette of divergent 
genes swapped by PeSL-mediated homologous recombination is discussed in the context of the PeSLs' potentially important role 
in facilitating phage adaptation to new hosts and environments.
Introduction

The diverse T4-like phages (formally called the Tquatrovirinae 
subfamily by Lavigne et al. [2009]) are widespread in the en- 
vironment (e.g., Jia et al. 2007; Comeau and Krisch 2008; 
Lopez-Bueno et al. 2009; Butina et al. 2010). These morpho- 
logically complex phages have large genomes, varying in size 
from 160 to 240 kb, that can be adapted to infect a wide
range of gram-negative bacteria (Ackermann and Krisch
1997). In the past decade, there has been considerable prog- 
ress in sequencing the genomes of various representatives of 
the T4-like phages (reviewed in Krisch and Comeau 2008 and 
Petrov et al. 2010). A major focus of this effort was to
understand the evolutionary diversity of this large and diverse yet
coherent group of viruses. The picture that has emerged from 
these studies is one of a common genome structure, with a
well-conserved set of structural and replication genes grouped 
together in large contiguous blocks (modules). The evolution 
of this "core T4 genome" has been primarily vertical. 
However, interspersed between the conserved T4 core se- 
quences are a series of hyperplastic regions (HPRs), which 
vary greatly in gene number and content. The HPRs are subject 
to much more horizontal transfer than the core genome (Filee 
et al. 2006; Comeau et al. 2007; Petrov et al. 2010). At least 
some, and perhaps most, of the genes in HPRs encode ancillary 
and adaptive functions that allow the core genome to propa- 
gate in a widely varied set of hosts. The analysis of a very closely 
related pair of T4 phages, the coliphages phi1 and RB49,
revealed the presence of a series of compound promoter-terminator
elements (promoter-early stem loops = PeSLs). Either
homologous or site-specific recombination involving these 
well-conserved genetic regulatory sequences offered a 
simple, plausible explanation for some of the genetic hyper- 
plasticity within the HPRs (Arbiol et al. 2010). As a conse- 
quence of these results, and our recent analysis of the 
mechanism that shuffles the major host-range determinants 
within the tail-fiber adhesin genes (Trojet et al. 2011), we have
extended this analysis to additional closely related pairs of T4 
phages that infect bacterial hosts that are phylogenetically dis- 
tant. The phage PST, whose virion morphology is indistinguish- 
able from T4, was originally isolated on the pathogen Yersinia 
pseudotuberculosis (Knapp and Zwillenberg 1964). However,
a preliminary analysis of its genome (Krisch HM, unpublished 
data) indicated that it was phylogenetically very closely related 
to T4, a coliphage. Similarly, phage nt-1 was isolated on the 
halotolerant host Vibrio natriegens (Zachary 1974), but its 
genome sequence was clearly very closely related (Tetart 
et al. 2001) to the phage KVP40 that was isolated on the 
pathogen V. parahaemolyticus and also infects multiple
Vibrio spp. (Matsuzaki et al. 1992; Miller et al. 2003a). Our 
notion was that an analysis of pairs of phages with differing 
host ranges could provide useful insights into the mecha- 
nism(s) and gene(s) responsible for their differences in host 
range. The additional pairs of phages analyzed were chosen 
to reflect the diversity within the T4-like group: one pair belongs
to the T-even subgroup (the closest relatives of T4),
whereas the other pair belongs to the much more evolutionarily
distant Schizo-T-even subgroup, whose members have a more elongated
virion head and consequently a significantly larger
genome size. This analysis suggests that, in spite of their
phylogenetic differences, both pairs of T4-type phages have
undergone similar genomic changes mediated in part by PeSL
sequences similar to those discovered in the Pseudo-T-even
phages RB49 and phi1 (Arbiol et al. 2010). Hence, this study
both confirms and extends our previous proposal about the 
genetic mechanism responsible for some of the early events in 
T4 phage genome differentiation. 

High-throughput sequencing allowed us to quickly obtain 
single contiguous sequences for both the PST and nt-1
genomes. These contigs could be closed to circles (fig. 1),
consistent with the circularly permuted, terminally redundant 
structure of the T4 group's genomes (Casjens and Gilcrease 
2009). The PST genome is only approximately 1 kb smaller 
than T4 and both sequences share many conserved features 
(table 1): similar GC levels, tRNAs (supplementary table S1,
Supplementary Material online), and gene/ORF contents. In
contrast, the nt-1 and KVP40 genomes have modest differ- 
ences in both their tRNA and gene/ORF contents. Both phage 
pairs have considerable overall nucleotide sequence conservation,
with PST/T4 having 84% of their genomes with more
than or equal to 95% similarity (fig. 2, pink shading), whereas
the nt-1/KVP40 pair diverges more: 82% of their genomes
have an identity level of >66% (essentially equivalent to
ignoring the wobble position). In spite of their sequence
divergence, the gene order and content of the paired
genomes are remarkably well conserved (supplementary fig.
S1, Supplementary Material online). Much of the divergence
between the paired genomes (fig. 1) is located in the HPRs 
where the variable genes/ORFs are generally grouped to- 
gether. For example, the HPR located in the first approximately 
60 kb of the nt-1/KVP40 pair (figs. 1 and 2) contains many of 
the differences in their gene/ORF content. Less dramatically, 
the PST/T4 pair is differentiated by only 55 genes and ORFs.
Although the majority of these encode unknown functions, a 
few have been shown to be involved in host-range determi- 
nation. For example, in the tail-fiber adhesin locus (the major 
determinant of host specificity), PST has a typical gp38-type 
adhesin sequence (supplementary table S2, Supplementary 
Material online), whereas in T4, this adhesin function is 
encoded by an unrelated C-terminal domain of the adjacent 
gp37 long tail fiber (Trojet et al. 2011). There are also
differences in this phage pair in their DNA modification systems
(glycosylases/methylases), which could reflect differences in
the specificities of Escherichia and Yersinia restriction-modification
systems. Finally, there are five differences between PST/
T4 in the internal protein (IP) sequences. The IPs are encapsi- 
dated in the virion head and injected upon infection along 
with the viral DNA. In at least some cases, these small proteins 
encode functions that negate host defensive mechanisms 
(Comeau and Krisch 2005; Bair et al. 2007). The nt-1/KVP40 
pair differs in approximately 125 genes/ORFs, the majority of 
which encode small proteins of unknown function with no 
matches in the databases (ORFans). Only four differential 
genes have currently identifiable potential roles in host- 
range determination — a transcription factor and dCMP deam- 
inase in KVP40, and two tRNA-modifying enzymes in nt-1 
(supplementary table S3, Supplementary Material online). 
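The windowed identity figures quoted above (84% of the PST/T4 genomes at 95% similarity or more; 82% of the nt-1/KVP40 genomes above 66% identity) were obtained with LAGAN (see Materials and Methods). The following Python sketch is not that pipeline; it only illustrates, under stated assumptions, how such a summary can be derived from a pairwise alignment. The function names, the non-overlapping window scheme, and the window size are hypothetical choices made for illustration. Note that a threshold of about 66% corresponds to two matching positions out of three in each codon, which is why the text relates it to ignoring the wobble position.

```python
def window_identities(aln_a, aln_b, window=100):
    """Percent identity for each non-overlapping window of a pairwise alignment.

    aln_a and aln_b are two aligned sequences of equal length in which
    '-' marks a gap. The window size is an illustrative assumption.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identities = []
    for start in range(0, len(aln_a), window):
        columns = list(zip(aln_a[start:start + window],
                           aln_b[start:start + window]))
        # A column counts as a match only if both residues agree and are not gaps.
        matches = sum(1 for x, y in columns if x == y and x != "-")
        identities.append(100.0 * matches / len(columns))
    return identities


def fraction_at_or_above(identities, threshold):
    """Fraction of windows whose identity is at least `threshold` percent."""
    if not identities:
        return 0.0
    return sum(1 for value in identities if value >= threshold) / len(identities)
```

With real data, `fraction_at_or_above(window_identities(a, b), 95)` would give a figure comparable in spirit to the 84% value reported for PST/T4, although the exact number depends on the aligner and window settings.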

As mentioned before, the examination of another closely 
related pair of T4-like genomes (RB49 and phi1) allowed us to
identify the PeSL elements that were apparently closely asso- 
ciated with genome plasticity (Arbiol et al. 2010). The analysis 
of the genomes of the PST and nt-1 pairs revealed similar sets 
of PeSLs that differ slightly in sequence between them and 



Genome Biol. Evol. 6(7):1611-1619. doi:10.1093/gbe/evu129 Advance Access publication June 19, 2014



PeSL Motifs Mediate Genome Shuffling in T4 Phages 



GBE 







Comeau et al. 






also from the RB49/phi1 pair (figs. 3 and 4). The stem-loop
structures (SLSs) of the new PeSLs described here differ from
the RB49/phi1 Gn-loop-Cn sequence: the nt-1 pair frequently
have a consensus SLS sequence of AnGn-loop-CnTn,
whereas the PST pair often has an AGnA-loop-TCnT sequence,
but with greater sequence variability in their SLS sequences
than the other pairs. The PeSL motifs of nt-1 seem to be more



Table 1

Summary of Genome Characteristics of PST versus T4 and nt-1 versus KVP40

Characteristic                            PST         T4          nt-1        KVP40
Genome size (nt)                          167,785     168,903     247,511     244,834
GC content (%)                            35.3        35.3        41.3        42.6
# tRNAs                                   9           8           29          30
All ORFs/genes
  Total number                            271         278         405         381
  Size range (aa)                         34-1,289    26-1,289    29-1,246    36-1,256
  Mean size (aa)                          193         197         187         194
  Median size (aa)                        131         135         126         133
Differential ORFs/genes
  Total number                            27          28          70          54
  Potentially involved in host range      5           7           2           2
  Other known virus functions                                     2
  Mobile elements                                     7           1
  Homologs of cellular functions                                  5           7
  Unidentified virus functions            22          14          8           2
  ORFans                                                          52          43

highly conserved than the others, suggesting that they may 
more efficiently promote recombinational shuffling than 
either the PST- or RB49-type PeSLs. The consensus promoter 
sequences in these PeSLs also differ from the near-perfect 
sequences (TTGACA...N17...TATAAT) observed in the RB49/phi1
pair: TTTACW...N17...TAYWAT for the PST/T4 pair
and TTGYVH...N17...TAWWAT for the nt-1/KVP40 pair.
Interestingly, nt-1 also has some PeSL-like sequences based
on the T4-like middle-mode (Mot) promoter consensus (therefore
called promoter-middle stem loop [PmSLs]), with a TGCTT
Mot-box regulator sequence followed by, at the appropriate 
distance, a -10-like box with the consensus sequence TATTAT 
(fig. 4; Miller et al. 2003b). As in the original RB49/phi1 pair, 
the new PeSLs are preferentially located in the HPRs (fig. 1) 
and frequently associated with nearby ORF(an)s that are dif- 
ferent between the pairs of phages. When located within the 
conserved core of the genome, they are often associated with 
particularly plastic nonessential adaptive genes. Figure 5 
shows some segments of the genomes that are particularly 
variable between the pairs of phages. It is clear that PeSLs are 
frequently found in close proximity to chromosomal insertions, 
deletions, or sequence exchanges. For example, in the nt-1 
genome between coordinates approximately 51-56 kb 
(fig. 5), there is an uninterrupted series of six ORFs with 
PeSL elements located in the intergenic spaces between 
them. Within this entire small PeSL-rich genomic segment, 
there have been numerous genetic rearrangements 
when compared with the KVP40 sequence: 5 insertions, 4 
replacements, 15 deletions, and an ORF displacement. 
Figure 6 shows, in detail, several typical examples of ORF ex- 
changes and deletions that were apparently mediated by 
PeSLs. 
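The promoter consensus sequences given above are written with IUPAC ambiguity codes (e.g., W = A or T; Y = C or T) and a 17-nt spacer. As a minimal sketch of how such a consensus can be turned into a sequence search, the Python code below expands each consensus into a regular expression and reports candidate positions on the forward strand only. This is an illustration and not the detection procedure used in this study (which relied on BLASTn, dot plots, and alignments of intergenic regions); the names `consensus_regex`, `find_promoters`, and `PROMOTERS` are hypothetical, and exact matching to a consensus will miss the degraded elements discussed in the text.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expanded to character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_regex(minus35, minus10, spacer=17):
    """Compile a pattern for a -35 box, a fixed-length spacer, and a -10 box."""
    def expand(motif):
        return "".join(IUPAC[base] for base in motif)
    # The lookahead lets overlapping candidates be reported as separate hits.
    return re.compile("(?=(%s[ACGT]{%d}%s))"
                      % (expand(minus35), spacer, expand(minus10)))


# Consensus sequences as quoted in the text for the three phage pairs.
PROMOTERS = {
    "RB49/phi1": consensus_regex("TTGACA", "TATAAT"),
    "PST/T4": consensus_regex("TTTACW", "TAYWAT"),
    "nt-1/KVP40": consensus_regex("TTGYVH", "TAWWAT"),
}


def find_promoters(seq, pattern):
    """Return (0-based start, matched sequence) for each forward-strand hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]
```

A fuller scan would also search the reverse complement and test for a stem-loop immediately upstream of the promoter, since a PeSL is a composite terminator/promoter element.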
Fig. 2. Whole-genome nucleotide level comparisons of PST to T4 (top) and nt-1 to KVP40 (bottom).
Conclusion

Despite being from two divergent groups within the T4-like 
viruses, both PST and nt-1 reveal a similar pattern of se- 
quence variation when compared with their closest known 
relatives. PeSL elements, first identified in yet another
branch of the T4-like phages, appear to be responsible for 
a nontrivial portion of the initial genetic divergence that 
occurs within this diverse phage subfamily. Significantly, 
PeSL elements are located in close proximity to many of 
the divergent loci we have examined. For some, these 
PeSLs are close enough to each other (e.g., fig. 6A) to 
generate exchanges by a PeSL mini-circle formation mecha- 
nism, which has been previously demonstrated to delete the 
DNA interposed between two tandem PeSLs (Arbiol et al. 
2010). For other divergent loci where there remains only a
single PeSL on one side of the ORF(s), the distal
flanking PeSL sequence has been lost either by a deletion event
or by a rare homologous recombination event occurring
between small randomly homologous sequences of
non-PeSL origin (Albertini et al. 1982). Such non-PeSL-mediated
mechanisms offer a plausible explanation for the g49.2/.3 
exchange in PST (fig. 5). The frequency of PeSLs located in 
close proximity to each other is notably higher in the most 
variable genome segments. Such groupings of PeSLs are less 
frequent in genome regions containing the conserved virion 
structural genes, perhaps because in such regions the tran- 
scriptional units are generally longer, and there is strong 
coupling of expression of the different structural compo- 
nents that must be assembled together in the mature 
virion in precisely defined ratios. These observations, coupled 
with the previous evidence based on comparative genomics 
of exchange events occurring in or near the PeSL motifs 
(Arbiol et al. 2010), suggest that such elements play at
least a contributory role in the creation of genetic diversity
within the T4-like phages. The more "traditional" general 
homologous recombination and horizontal transfers of DNA 
adjacent to mobile elements (such as homing endonucleases 
common to these phages; Kadyrov et al. 1997;
Brok-Volchanskaya et al. 2008) are clearly also involved; the
latter (like the PeSL mechanism) is apparently more impor- 
tant in the less conserved, nonstructural regions of the 
genome. In such regions, we suggest that PeSL-mediated 
exchanges could make a nontrivial, but so far unrecognized, 
contribution to genome plasticity. 
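The mini-circle mechanism referred to above deletes the DNA interposed between two tandem PeSLs. As a purely schematic illustration, the Python sketch below models the outcome of a single crossover between two direct copies of a repeat: one copy stays in the genome, and the other leaves with the interposed DNA as a circle. The function name `excise_between_repeats` and the use of exact string matching are assumptions made for illustration; real PeSLs are not identical copies, and the sketch makes no claim about whether the underlying recombination is homologous or site specific.

```python
def excise_between_repeats(genome, repeat):
    """Model a crossover between the first two direct copies of `repeat`.

    Returns (deleted_genome, minicircle), or None if fewer than two
    non-overlapping copies are present. The mini-circle is returned as a
    linear string whose two ends are understood to be joined.
    """
    first = genome.find(repeat)
    if first == -1:
        return None
    second = genome.find(repeat, first + len(repeat))
    if second == -1:
        return None
    # One repeat copy remains in the genome; the segment from the first copy
    # up to (but not including) the second copy is lost as the circle.
    deleted_genome = genome[:first] + genome[second:]
    minicircle = genome[first:second]
    return deleted_genome, minicircle
```

Running the same logic in reverse (inserting a circle that carries one repeat copy at a resident copy) gives the integration outcome, which is why a single flanking PeSL can be compatible with either an excision or an integration history.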

It should be noted that some of the sequence divergences
between the PST/T4 and nt-1/KVP40 pairs could
explain the differences in the host ranges of these closely
related phages. As mentioned in our original observations 
on PeSLs (Arbiol et al. 2010), 20 years ago similar observa- 
tions had been made within the T4 IP locus (Repoila et al. 
1994), without detailing a mechanism, and it was 
hypothesized that a specific shuffling mechanism within 
the IP palette between related phages could allow major 
Fig. 5. Presence of multiple fully functional or degraded ("1/2") PeSLs/PmSLs near indels in HPRs of the PST (top) and nt-1 (bottom) genomes. The
degraded PeSLs have either 1) only intact -35 regions remaining (for ~43 K/51 K in nt-1); 2) intact -35 regions and stem loops (no -10s; for ~47 K/51 K
in PST); or 3) multiple weak/degraded copies of putative -35/-10 regions inappropriately spaced (most probably caused by recombinational slippage) and
degraded stem loops (for ~53 K in nt-1) or with one or more still-intact stem loops (for ~43 K/50 K in PST).



modifications in host range. It is clear that, due to the ap- 
parently wide distribution of PeSLs, we can expand this idea 
to yet larger segments of the T4 genomes (i.e., the HPRs 
but usually not structural regions) and probably to all 
of the T4-like phages. Further experimentation on the 
targets and the mechanism of PeSL-mediated genome 
divergence is clearly merited. Such an effort will also be 
essential for us to understand the numerous and 
largely unknown gene functions that allow these phages 
to so easily adapt to an ever-changing suite of environments 
and hosts. 

Materials and Methods 

Bacteriophage DNA Preparation and Pyrosequencing 

The Yersinia phage PST and Vibrio phage nt-1 strains came 
from HMK's Toulouse collection of myoviruses. PST was orig- 
inally obtained from Dr Grimont of the Pasteur Institute in 
Paris, whereas nt-1 was obtained from the Felix d'Herelle 
Reference Center for Bacterial Viruses (HER150). Yersinia 
pseudotuberculosis NCTC 10275 and V. natriegens HER1138 
were the host bacteria used to prepare stocks of the phages 
using standard techniques as described by Carlson and Miller 
(1994). All strains were grown at 37 °C in Luria-Bertani 
medium. The DNA was extracted from high-titer stocks as 
detailed in Ackermann et al. (2011). The resulting pure 
DNAs were used for bar-coded library construction and
454 pyrosequencing that were performed according to the 
manufacturer's instructions. PST was sequenced at the IBIS/ 
Universite Laval Plate-forme d'Analyses Genomiques
(Quebec), and nt-1 was sequenced at the Broad
Institute (www.broadinstitute.org/annotation/viral/Phage, last 
accessed June 22, 2014) as part of a Gordon and Betty Moore 
Foundation (GBMF) Marine Microbiology Initiative (www. 
moore.org/programs/science/marine-microbiology-initiative, 
last accessed June 22, 2014). 

Genome Assembly and Annotation 

Raw reads were assembled using the de novo GS Assembler 
(Roche), resulting in one final contig each for PST and nt-1 
with 54- and 30-fold coverage, respectively. Analyses of the 
genomes were done with the following programs: 1) 
GLIMMER (www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi,
last accessed June 22, 2014; >100 nt; bacterial
genetic code) and GeneMark (exon.gatech.edu/GeneMark, 
last accessed June 22, 2014; heuristic approach for prokary- 
otes and viruses; >90nt) for ORF determinations; 2) tRNA 
search using tRNAscan-SE (lowelab.ucsc.edu/tRNAscan-SE, 
last accessed June 22, 2014); 3) Java Word Frequencies 
(athena.bioc.uvic.ca/virology-ca-tools/jfreq/, last accessed 
June 22, 2014) and Dot Plot Alignments (MIPS Gepard; 
www.helmholtz-muenchen.de/icb/gepard, last accessed 
June 22, 2014) for the exploration of DNA and protein 
"words"/patterns; 4) LAGAN (lagan.stanford.edu/lagan_web/index.shtml,
last accessed June 22, 2014) for the visualization/calculation
of whole genome-to-genome nucleic-acid-level
identities; 5) the BLAST tools at NCBI (blast.ncbi.nlm.nih.gov,
last accessed June 22, 2014) for the characterization of
genes/proteins and untranslated regions of the DNA; and 6) 
DNAPlotter for generating the circular genome visualizations
Fig. 6. Examples of recombinational events causing the topology differences between the PST/T4 and KVP40/nt-1 phage pairs in which
PeSLs are implicated. Deduced sites of recombinations are between the phages investigated here and a "last common ancestor" ("LCA"; may still
be extant but not yet isolated and characterized). (A) The exchange of genes ndd.4/.5 in T4 (left) for ORF268/269 in PST (right) could have come
about through PeSL mini-circle integration (as outlined in Arbiol et al. 2010) of different DNA cassettes from different donors into the same (or similar)
LCA. (B) The deletion of ORF232 in nt-1 could have come about through PeSL mini-circle excision (Arbiol et al. 2010) from the LCA (left) and the
resulting topology in KVP40 (right) achieved through loss/deletion of the downstream (potentially redundant) PeSL. Note here that ORF109/110 in nt-1
are homologs of ORF231/233 in KVP40 and that the reverse recombination is also possible (PeSL mini-circle integration of ORF232 into nt-1 to generate
KVP40/LCA).



(www.sanger.ac.uk/resources/software/dnaplotter, last
accessed June 22, 2014). Specifically for the PeSL elements,
local BLASTn searches with previously described PeSLs and self-on-self dot plots
readily detected the repeated elements; all intergenic
spaces of large-enough size were then extracted and
aligned to confirm the promoter elements and detect
the SLSs.
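The extraction of intergenic spaces described above can be sketched as follows. This Python example assumes that ORF coordinates are available as 1-based inclusive (start, end) pairs and treats the genome as linear, so a gap spanning the arbitrary origin of the circular map is reported as two pieces. The function name `intergenic_regions` and the default `min_size` of 50 nt are hypothetical: the size cutoff actually used is not stated here.

```python
def intergenic_regions(orfs, genome_length, min_size=50):
    """Return intergenic intervals of at least `min_size` nucleotides.

    orfs: iterable of (start, end) pairs, 1-based inclusive, on either strand.
    Overlapping or nested ORFs are handled by tracking the furthest end seen.
    The min_size default is an illustrative assumption.
    """
    regions = []
    previous_end = 0
    for start, end in sorted(orfs):
        gap = start - previous_end - 1
        if gap >= min_size:
            regions.append((previous_end + 1, start - 1))
        previous_end = max(previous_end, end)
    # Sequence remaining after the last ORF, up to the end of the linear map.
    if genome_length - previous_end >= min_size:
        regions.append((previous_end + 1, genome_length))
    return regions
```

The returned intervals can then be sliced from the genome sequence and aligned, or passed to a consensus search, to look for promoter boxes and stem-loop structures.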



Nucleotide Sequence Accession Numbers 

The complete genome sequences of PST and nt-1 were de- 
posited in GenBank under accession numbers KF208315 
and HQ317393, respectively. According to the GBMF
guidelines, phage nt-1 was also deposited in CAMERA
(camera.calit2.net, last accessed June 22, 2014). 

Supplementary Material 

Supplementary tables S1-S3 and figure S1 are available at
Genome Biology and Evolution online (http://www.gbe. 
oxfordjournals.org/). 